Some common deleterious mutations are shared in SARS-CoV-2 genomes from deceased COVID-19 patients across continents

The identification of deleterious mutations in different variants of SARS-CoV-2 and their roles in the morbidity of COVID-19 patients has yet to be thoroughly investigated. To unravel the spectrum of mutations and their effects within SARS-CoV-2 genomes, we analyzed 5,724 complete genomes from deceased COVID-19 patients sourced from the GISAID database. This analysis was conducted using the Nextstrain platform, applying a generalized time-reversible model for evolutionary phylogeny. These genomes were compared to the reference strain (hCoV-19/Wuhan/WIV04/2019) using MAFFT v7.470. Our findings revealed that SARS-CoV-2 genomes from deceased individuals belonged to 21 Nextstrain clades, with clade 20I (Alpha variant) being the most predominant, followed by clade 20H (Beta variant) and clade 20J (Gamma variant). The majority of SARS-CoV-2 genomes from deceased patients (33.4%) were sequenced in North America, while the lowest percentage (0.98%) came from Africa. The ‘G’ clade was dominant in the SARS-CoV-2 genomes of Asian, African, and North American regions, while the ‘GRY’ clade prevailed in Europe. In our analysis, we identified 35,799 nucleotide (NT) mutations throughout the genome, with the highest frequency (11,402 occurrences) found in the spike protein. Notably, we observed 4150 point-specific amino acid (AA) mutations in SARS-CoV-2 genomes, with D614G (20%) and N501Y (14%) identified as the top two deleterious mutations in the spike protein on a global scale. Furthermore, we detected five common deleterious AA mutations, including G18V, W45S, I33T, P30L, and Q418H, which play a key role in defining each clade of SARS-CoV-2. Our novel findings hold potential value for genomic surveillance, enabling the monitoring of the evolving pattern of SARS-CoV-2 infection, its emerging variants, and their impact on the development of effective vaccination and control strategies.

sequence (WGS) with high read coverage (> 29,000 bp) from the Global Initiative on Sharing All Influenza Data (GISAID) up to February 2023. After thorough filtering of these genomes, 5,724 complete genomes belonging to deceased COVID-19 patients from different demographics were selected for further analysis. These WGS data (n = 5,724) comprised SARS-CoV-2 genome sequences from 123 countries across five continents (Asia, Africa, Europe, North America and South America). The geographical distribution of the SARS-CoV-2 WGS from deceased COVID-19 patients shows that 33.4% of the genomes were sequenced in North America, followed by 28.8% in Europe, 19.9% in South America, 17.7% in Africa, and 0.7% in Asia (Fig. 1A).
The "G" clade of SARS-CoV-2 predominated in deceased COVID-19 patients from the Asian, African and North American regions, while most deaths in Europe were associated with the "GRY" clade. In contrast, most deaths on the South American continent were associated with the "GR" clade (Fig. 1F). Looking at the cumulative death cases registered throughout the study period (January 2020 to February 2023), we found that most of the WGS data (approximately 1000 sequences) from deceased COVID-19 patients were sequenced from North American residents. Although countries submitted their WGS irrespective of regional barriers, the African region contributed the fewest sequences from deceased COVID-19 patients (fewer than 200) (Fig. 1G). Relevant demographic and medical data are described in Data S1.
Another notable finding of this study is the prediction of NT alterations in the SARS-CoV-2 genomes of various nations. We compared the NT mutational spectra in the top 12 countries (Table 1) where the highest numbers of COVID-19-associated deaths were reported. The NT mutation at the D614G position in the S gene was prominent in the 12 nations with the highest incidence of COVID-19 deaths (Table 1). Similarly, the largest numbers of NT mutations were identified at the T1001I, G204R, E1264D, P314L, E92K, Q57H, Q77E, S5L, L29F, S41P, and T40I positions of SARS-CoV-2 genomes. The SARS-CoV-2 genomes sequenced from deceased COVID-19 patients in the USA showed the most NT mutations at I2230T in ORF1a, D3L in N, P218L in ORF1b, Y73C in ORF8, Q57H in ORF3a, P10S in ORF9b, A43S in ORF7a, T175M in M, E13D in ORF6, T40I in ORF7b, and P71L in E genes. In contrast, genomes sequenced from India showed the highest NT mutation frequency at D614G in the S gene, followed by T1001I in ORF1a, G204R in the N gene, K1383R in ORF1b, R52I in ORF8, P42L in ORF3a, Q77E in ORF9b, N38T in ORF7a, L87F in the M gene, I14T in ORF6, S5L in ORF7b and T9I in the E gene (Table 1). These findings imply that while some SARS-CoV-2 NT mutations contributed to its evolution, a few may benefit viral adaptation in a specific demographic distribution. Variations in NT mutation patterns in SARS-CoV-2 genomes may be attributable to population age distribution, gender, host immunity, and socioeconomic level.

Point-specific amino-acid mutations in SARS-CoV-2 genomes of the deceased COVID-19 patients
To identify deleterious mutations in the SARS-CoV-2 genomes, we analyzed point-specific amino acid (AA) mutations in the genomes of this virus obtained from deceased COVID-19 patients using the SIFT, PolyPhen-2, SNAP2, PROVEAN, PredictSNP, and MAPP web-based tools. Deleterious mutations were critically analyzed and cross-checked using these tools. A threshold value of −2.5 was chosen to ensure highly balanced accuracy in defining a deleterious mutation. Therefore, mutations with a value smaller than −2.5 were identified as deleterious 33. Among the AA mutations identified, the numbers of deleterious and non-deleterious mutations were 951 and 3199, respectively. The highest number of deleterious AA mutations was found in ORF1b (n = 338), followed by ORF1a (n = 236) and ORF3a (n = 122). In addition, 49, 45, 42, 40 and 30 deleterious AA mutations were predicted in the N, ORF8, ORF7a, S and ORF9b segments, respectively. In this study, the open reading frames (ORFs) of the SARS-CoV-2 genome possessed a higher percentage of deleterious AA mutations than other segments. For example, ORF3a, ORF6, ORF7a, ORF8 and ORF9b harbored > 50.0% deleterious AA mutations. However, the remaining segments of the SARS-CoV-2 genomes harbored fewer mutations (< 30) (Table S1).
The overall AA mutations detected in the spike protein of the SARS-CoV-2 genomes of the deceased patients are shown in Fig. 4A. In this study, the highest frequency of AA mutations (31.85%) was recorded in the S gene, which is responsible for viral pathogenicity. The S gene of the study genomes underwent AA mutations at 32 sites (Fig. 4A). Fourteen of these AA mutation sites, such as V3L, L5Y, L10S, S13L, T19R, P26L, D401, S60A, P82AT, V1201Y204R, S2051, L2231, Y2651, were predicted in the N-terminal domain (NTD) fragment, while eight of them (e.g., Q314, G339D, S371F, S373P, F377Y, D405N, K417N, L452R, T478K, and E484Q) were found in the RBD region. The remaining eight sites, such as A570D, D614G, P681R, N764K, D796Y, N856K, R1000L, and E1188L, were positioned in diverse areas of the S protein. The fusion peptide area was well conserved, as no AA mutation hotspots were discovered there (Fig. 4A).
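The substitution labels used throughout this section follow the usual reference-residue, position, substituted-residue notation (for example, D614G). As a minimal sketch, assuming labels are plain strings, the following Python helper splits such a label into its parts and returns `None` for labels that do not fit the pattern. The names `Substitution` and `parse_substitution` are hypothetical and are not part of the study's pipeline.

```python
import re
from typing import NamedTuple, Optional

# The 20 standard one-letter amino-acid codes.
_AA = "ACDEFGHIKLMNPQRSTVWY"

# One reference residue, a 1-based position, one substituted residue.
_SUBSTITUTION = re.compile(rf"^([{_AA}])(\d+)([{_AA}])$")


class Substitution(NamedTuple):
    ref: str       # residue in the reference sequence
    position: int  # 1-based position within the protein
    alt: str       # residue observed in the sample


def parse_substitution(label: str) -> Optional[Substitution]:
    """Parse a label such as 'D614G'; return None if it is not well formed."""
    match = _SUBSTITUTION.match(label.strip())
    if match is None:
        return None
    ref, pos, alt = match.groups()
    return Substitution(ref, int(pos), alt)


# Labels taken from the text; 'Q314' lacks a substituted residue.
labels = ["D614G", "N501Y", "P681R", "Q314"]
parsed = {label: parse_substitution(label) for label in labels}
```

A check of this kind can be useful for flagging labels that need manual review before frequencies are tallied per site.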
Apart from the significant AA mutation changes in the spike protein, there were notable changes in the mutational spectra of other proteins as well. Across the different AA mutational spectra, a very large number of recurrences was observed in the ORF1b (T67I, 2571 times) and N (G204R, 1484 times; R203K, 1559 times) segments. Seven further AA mutations occurred in > 500 sequences: I2230T (543 times), A1708D (547 times) and T1001I (551 times) in ORF1a; L83F (631 times) in ORF3a; and F3L (530 times), D34G (532 times) and F120L (532 times) in the ORF8 fragment of the SARS-CoV-2 genome (Fig. 4B).
Over time, more and more deleterious AA mutations are being detected among SARS-CoV-2 genome sequences, especially those sequenced from deceased COVID-19 patients. The most frequent AA mutations of different proteins in each continent are listed to clarify how the mutational tendency of SARS-CoV-2 depends on region (Table 2). The AA mutations occurring in more than two continents are highlighted. Interestingly, the D614G mutation in the spike protein and the S26L mutation in the ORF3a protein were found in SARS-CoV-2 genomes from all continents. Another noteworthy AA mutation, A1918V, occurred in both the Asian and North American regions, whereas a slightly different mutation, A1818L, was found in the African region. However, the ORF8 and ORF9b fragments showed no similarity in mutational pattern across regions (Table 2).
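The rule of highlighting mutations that occur in more than two continents amounts to a simple count per mutation. The sketch below illustrates that rule in Python. The input dictionary is a toy example assembled only from the mutations named in the paragraph above; it is not the content of Table 2, and the function names are hypothetical.

```python
from collections import Counter


def continents_per_mutation(top_by_continent):
    """Count, for each mutation, the number of continents in which it is listed."""
    counts = Counter()
    for mutations in top_by_continent.values():
        counts.update(set(mutations))  # set() guards against repeats within a continent
    return counts


def shared_mutations(top_by_continent, min_continents=3):
    """Return mutations listed in at least `min_continents` continents, sorted by name."""
    counts = continents_per_mutation(top_by_continent)
    return sorted(m for m, n in counts.items() if n >= min_continents)


# Toy input (assumption): only mutations mentioned in the text, not the full Table 2.
toy = {
    "Asia": ["D614G", "S26L", "A1918V"],
    "Africa": ["D614G", "S26L", "A1818L"],
    "Europe": ["D614G", "S26L"],
    "North America": ["D614G", "S26L", "A1918V"],
    "South America": ["D614G", "S26L"],
}

highlighted = shared_mutations(toy)  # "more than two continents" means at least three
```

With this toy input, only D614G and S26L pass the filter, while A1918V (two continents) and A1818L (one continent) do not.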

Effects of mutations on protein functions
Finally, we considered the deleterious signature mutations to evaluate changes in protein biological function using the PROVEAN, PolyPhen-2, and PredictSNP tools (Fig. S1). We found the largest-magnitude PROVEAN score, −13.22, for the W45S and W45R deleterious mutations, and −12.278 for the W45L mutation of the ORF8 gene (Fig. 5A). Interestingly, these three detrimental mutations were also identified in the same ORF8 region of SARS-CoV-2 genomes using the other tools. Using these three tools, G18V, W45S, I33T, P30L, and Q418H were identified as the frequent mutations responsible for defining each clade, as they are all deleterious and unstable. Using PredictSNP, we predicted the highest number of detrimental mutations (n = 1875) at Q57H in the ORF3a gene (Fig. 5B). With PolyPhen-2, we detected the highest numbers of deleterious mutations at D160Y (n = 1559) in the M gene and at G204R (n = 1448) and D3L (n = 540) in the N gene. All these deleterious mutations had a PolyPhen-2 score of 1, whereas the sensitivity and specificity were 0 and 1, respectively. These findings indicate that differences in mutations in distinct regions are likely to affect protein function. Top mutations ranked by PredictSNP score are visualized in Fig. 5C. The ORF8 mutations W45L and W45S received the most negative values under the PredictSNP prediction model, both scoring below −12. No other mutation in this segment or in other proteins received such negative scores across the mutational spectra (Fig. 5C).

Discussion
Analyzing mutations in SARS-CoV-2 genomes presents a valuable opportunity to gain insights into how various genes undergo frequent changes that can influence viral characteristics and disease manifestation. One of these viral features, the transmissibility of the SARS-CoV-2 virus, is impacted by genetic variations within the virus genome. Despite these genetic variations, there's an ongoing need to develop new vaccines that can adapt to emerging variants 34. In this study, we examined the mutational patterns in SARS-CoV-2 genomes from deceased COVID-19 patients across five continents. For the first time, we have identified both nucleotide (NT) and amino acid (AA) mutations, including both deleterious and non-deleterious changes, in the SARS-CoV-2 genomes of deceased COVID-19 patients on a global scale. We discuss the implications of these mutations for genomic surveillance and the management of this viral disease. Additionally, we present the epidemiological distribution of different variants and clades in the SARS-CoV-2 genomes of deceased patients using a robust phylogenetic approach.
Genome-wide variations in SARS-CoV-2 reveal the evolution and transmission dynamics of several variants, which are critical considerations for the control and prevention of COVID-19. The SARS-CoV-2 pandemic had led to over 6.9 million deaths as of October 2023 (https://www.worldometers.info/coronavirus/). We analyzed 5,724 full-length SARS-CoV-2 genomes (after thorough filtering of 243,270 genomes) sequenced from deceased COVID-19 patients of diverse demographics from January 2020 to February 2023, and detected some shared deleterious mutations associated with the morbidity of COVID-19 patients in different continents. Furthermore, we traced the spread of the variants worldwide and provided insight into the evolutionary pressure in SARS-CoV-2 genomes associated with its virulence.
In the current study, we applied the open-source programs GISAID EpiCoV™ Database 35 and Nextstrain 36. By comparing SARS-CoV-2 genomes from deceased patients in 123 countries across five continents, we found that the largest share (> 33.0%) of SARS-CoV-2 genomes from deceased COVID-19 patients was sequenced in North America, followed by Europe (~ 29.0%), South America (~ 20%), and Africa (~ 18%). The lowest number of SARS-CoV-2 genomes from deceased COVID-19 patients was sequenced in Asian countries (< 1.0%). The country-wise distribution of these genomes in the top ten countries also revealed marked differences: Brazil (> 16.0%), Mexico and the USA (> 14.0% each), Bulgaria (> 13.0%) and India (> 5.0%) were the countries from which the highest numbers of genomes were sequenced from deceased COVID-19 patients. The lowest share of WGS was sequenced in one transcontinental North American country, Panama (0.5%). In addition, male COVID-19 patients with an average age of 55 years had a higher COVID-19 risk than female patients (average age of 70 years). The genetic differences among SARS-CoV-2 strains could be linked with their geographical distributions and COVID-19 severity. The findings of the present study are in line with several earlier studies reporting that the integration of geographical and climatic data with genetic mutation analysis promises a fuller understanding of the origins, dispersal and dynamics of the evolving SARS-CoV-2 virus 2,5,37. Since the emergence of the pandemic in China in December 2019, thousands of variants of SARS-CoV-2 have emerged 38,39. The SARS-CoV-2 genomes sequenced from deceased people belonged to 21 Nextstrain clades comprising both VOCs (e.g., 20I, 20H, 20J, 21A, 21I, and 21J) and VOIs (e.g., 21C, 21G and 21H). This virus is continuously mutating from its ancestral strain (e.g., hCoV-19/Wuhan/WIV04/2019), which results in the upsurge of new variants. One of the interesting findings of this study is that clade 20I (Alpha variant) was the most predominant clade, followed by clade 20H (Beta variant) and clade 20J (Gamma variant), in the SARS-CoV-2 genomes of deceased COVID-19 patients. These results are consistent with previous studies that reported the Alpha variant as the most prevalent VOC, followed by the Beta and Gamma VOCs 40,41. Currently, multiple variants are circulating globally 39,41. Recently, five VOCs have drawn tremendous public attention due to increased transmissibility or virulence that may attenuate the effectiveness of current control measures, available diagnostics, vaccines, and therapeutics. SARS-CoV-2 is highly likely to mutate and evolve to enhance its infectivity and transmissibility, posing a severe risk of accumulation and dominance of immunologically relevant mutations across different lineages in the near future. Accumulation of single or multiple mutations at the RBD-ACE2 interface can lead to more deadly waves of COVID-19. A thorough inspection of hot-spot residues, genomic epidemiology, evolutionary history, and selective pressures can help to predict new mutations.
Through an in-depth investigation into the emergence and spread of VOCs, such as Alpha, Beta, Gamma, and Delta, and VOIs, such as Eta, and several Nextstrain clades, this study reports the emergence and spread of different variants, clades and/or lineages of SARS-CoV-2 in deceased COVID-19 patients from across the globe. Our analysis showed that B.1.1.529 (Omicron), B.1.617.2 (Delta), B.1.1.7 (Alpha), B.1.1.28 (Brazilian variant), P.1 (Gamma), and B.1.1.519 (Mexican variant) were the most prevalent variants/lineages in the SARS-CoV-2 genomes of the deceased patients. One of the key findings of this study is the prevalence of the 'G' clade in the SARS-CoV-2 genomes of deceased COVID-19 patients in the Asian, African, and North American regions. In contrast, most of the death cases in Europe and South America were associated with the 'GRY' and 'GR' clades, respectively. Due to a significant antigenic shift, the SARS-CoV-2 virus has shown extensive mutations that have resulted in several VOCs, including Alpha, Beta, Gamma, Delta, and Omicron. Many studies have investigated changes in SARS-CoV-2 genomes 2,42,43. However, they do not specifically study the mutations in SARS-CoV-2 genome sequences retrieved from deceased COVID-19 individuals.
Out of 5,724 complete genomes of SARS-CoV-2 obtained from deceased COVID-19 patients, we detected 35,799 NT mutations, with the largest share (31.8%) occurring in the spike (S) protein. The S gene underwent NT mutations at 32 sites, including 14 and eight mutations in the NTD and RBD regions, respectively; the remaining eight mutations were detected at other sites of the S protein. In addition, the ORF1b, N, ORF1a, and ORF8 fragments of the SARS-CoV-2 genome also underwent NT mutations at different positions. Mutation is a natural aspect of viral reproduction during disease outbreaks, with RNA viruses exhibiting more mutations than DNA viruses 44. Irrespective of their impact on viral viability, the mutations found in the genome of these viral progenitors will dominate the population. This interaction between natural selection and serendipity influences the evolution of viruses within individuals, groups, and regions 45,46. The D614G site was found to have the most mutations changing NT patterns. In this study, the D614G mutation was predominant in deceased people in 10 countries, such as Brazil, Mexico, the USA, India, Iran, China, the UK, France, Japan, and Germany, and in five continents. The D614G mutation was observed to increase viral fitness and infectivity 47,48. Several researchers have reported the functional significance of D614G in the S protein, linking its role to the increased pathogenesis of the virus 49. The spike protein change at D614G was dominant throughout the world, with increased infectivity and transmission 37. Moreover, mutations in the spike protein concurrently elevate viral attachment to ACE2 receptors on the surface of the host cell 50. In addition, most countries showed D614G mutations more often than mutations at other sites 47,48. However, existing evidence is inconclusive about the sole effect of the D614G mutation on pathogenicity and fatality, as numerous factors, such as aging and comorbidity, play a crucial role 44. One of the important findings of this study is the predominance of the "G" clade in the Asian, African and North American regions, while most of the death cases in Europe were associated with the "GRY" clade. Different clades of SARS-CoV-2 have been identified during the pandemic. Some spread worldwide, while others quickly faded away. Identifying the clades and/or variants circulating in a region, country or society is important for a better understanding of their circulation, genetic diversity and mutations in all non-structural, structural and accessory genes 51.
One of the standout findings of this study is the discovery of 4,150 AA mutations, encompassing both deleterious and non-deleterious changes, distributed across various segments of the SARS-CoV-2 genomes. These mutations exhibited significant variations among the genomes of the deceased individuals from five different continents. Most of the ORFs of the SARS-CoV-2 genomes possessed more deleterious AA mutations than other segments of the genomes. Our findings confirmed that the D614G mutation in the spike protein and the S26L mutation in the ORF3a fragment were present in the SARS-CoV-2 genomes from the deceased patients of all continents. Our results corroborate many previous studies which reported that spike gene mutations account for most of the clinically influential VOCs, while the ORF1a frame of the genome serves as a key region for non-structural protein (NSP) mutations 2,52. Since the first report of the D614G mutation in SARS-CoV-2 genomes 53, other modifications have also been reported 2,23,48. This finding suggests that these two mutations might be associated with increased viral transmission and infectivity and have likely arisen independently in multiple regions, indicating that they may have a selective advantage. The persistence of these mutations across different populations underscores the importance of monitoring the mutational tendencies of the virus to inform public health responses and the development of effective treatments and vaccines. Although vaccination reduces symptomatic cases, hospitalizations, and mortality, the mutation of the virus and the emergence of new variants (e.g., VOIs or VOCs) limit the effectiveness of these vaccines 54. One of the major concerns about these emerging mutations is that they could lead to dangerous modifications in the SARS-CoV-2 genome that would ultimately increase infection severity or cause currently developed vaccines to fail. Moreover, the detected VOIs or VOCs may cause vaccinated persons to be re-infected with new variants 54. However, scientists across the globe are trying to develop and formulate new vaccines (or vaccine candidates) against the newly emerged VOIs or VOCs of SARS-CoV-2. The updated vaccines are not expected to prevent all cases of COVID-19; rather, they may reduce severe illness, hospitalization, and death from infection 55.
Another novel aspect of our study is the identification of regional differences in the AA mutation patterns of SARS-CoV-2. For instance, the A1918V mutation was identified in the SARS-CoV-2 genomes of deceased COVID-19 patients from both the Asian and North American regions, while a slightly different A1818L mutation was present only in the African region, suggesting that different population genetics and transmission patterns may be driving these differences. Remarkably, we found a lack of similarity in the mutational spectra of the ORF8 and ORF9b fragments of the virus across regions. These regions of the SARS-CoV-2 genome might be more conserved due to functional constraints, and mutations in these regions may have a greater impact on viral pathogenesis. Other common mutations identified in this study included R203K and G204R, two AA alterations in the N protein caused by a trinucleotide substitution. R203K was detected primarily in samples from Mexico, Iran, the United Kingdom, and other regions across North America and Europe. On the other hand, the AA mutation G204R was found predominantly in samples from India, Spain, France, and Germany and across the continents of Asia, Africa, and Europe. The AA changes R203K and G204R were predicted to reduce protein stability. This finding is in line with other studies that identified R203K and G204R as destabilizing the structure of the N protein while improving its interaction with the Envelope protein to enhance viral release 5,56. The N protein contributes to the creation of spiral ribonucleoproteins during RNA genome packing, thus influencing viral reproduction and altering the homeostasis of infected individuals 57. Modifications in the viral N protein structure enhanced its replication, pathogenicity, and adaptability 56. We also found that the patterns of AA changes differ throughout the SARS-CoV-2 genomes. A higher proportion of deleterious AA mutations was predicted in the open reading frames than in other segments. The high morbidity and mortality of immunocompromised patients infected by SARS-CoV-2 might be correlated with the consequences of these deleterious AA mutations 2. In addition, several deleterious mutations show geographic patterns, alluding to the ability of the virus to modify or adapt itself within a distinct microenvironment. To understand these mutation patterns, it is important to have insight into their frequency at both local and global scales, which can help preempt the emergence of VOIs and VOCs, especially in deceased COVID-19 patients. The COVID-19 pandemic with these VOIs and VOCs has exacerbated inequality within and between countries, eroded global solidarity and trust, and caused dramatic backsliding on key health outcomes due to disrupted access to essential health services. SARS-CoV-2 carrying several AA mutations throughout the genome affects the infectivity of COVID-19 in humans, and thus the accumulation of mutations over time has become a global public health concern 33,58. The findings of this study emphasize that, with each AA mutation in the structural segments of SARS-CoV-2, the evolved variants might have acted differently and become more lethal, leading to an increase in case fatalities. Moreover, VOIs and VOCs of SARS-CoV-2 have emerged over time, creating an alarming situation globally that necessitated the urgent development of vaccines to mitigate the impact of SARS-CoV-2 on public health, the economy and society.
It's important to note that this study has some limitations, primarily stemming from the relatively small sample size and the potential for sampling bias. The study was constrained by the availability of a limited number of complete SARS-CoV-2 genomes, with approximately 5700 genomes obtained from deceased COVID-19 patients around the world. Additionally, there was a lack of data normalization. These limitations are partly due to the fact that not all countries worldwide sequenced SARS-CoV-2 genomes from all deceased COVID-19 cases or uploaded them to public databases like GISAID. For instance, the top five contributing countries in this study were Brazil (n = 456), USA (n = 419), Bulgaria (n = 361), India (n = 175), and France (n = 115). Furthermore, as the number of SARS-CoV-2 genomes evolves over time, we focused on identifying point-specific mutations in the SARS-CoV-2 genomes of deceased COVID-19 patients. Therefore, the mutation patterns should be considered approximate findings. It's also worth noting that, due to variability in different normalization methods, it's not appropriate to directly compare data between two or more structural segments of the SARS-CoV-2 genome, as [...]

[...] Analysis of Protein Polymorphism (MAPP; http://www.ngrl.org.uk/Manchester/page/mapp-multivariate-analysis-protein-polymorphism.html). SIFT is a standalone version of the web server that predicts the potential impact of AA substitutions on protein function 62. PolyPhen-2 is a tool that predicts the possible impact of AA substitutions on human protein structure and function using structural and comparative evolutionary considerations 63. The mutation score for PolyPhen-2 ranges between 1 (harmful) and 0 (neutral) 63. Likewise, SNAP2 is a bioinformatics tool that uses annotations from the protein mutant database (PMD) to predict the effects of non-synonymous single nucleotide polymorphisms (nsSNPs) on protein function 64. The PROVEAN tool generates a score for each mutation to predict whether its impact is harmful or neutral. A mutation score above the default threshold of −2.5 suggests a neutral effect, whereas a score below the threshold indicates a detrimental effect 65. The MAPP method predicts the deleteriousness of non-synonymous SNPs through an alignment of interspecific sequences that are putatively orthologous to the protein of interest 66. To predict the overall effect of AA mutations on protein function (topology and flexibility), PredictSNP 67, PolyPhen-2 63, and PROVEAN 65 were utilized. PredictSNP is a consensus classifier combining the predictions of SIFT, PolyPhen-2, SNAP, PROVEAN and MAPP. This consensus classifier gives significantly improved and more accurate predictions than the individual tools. Deleterious mutations were critically analyzed and cross-checked using these tools. A threshold value of −2.5 was chosen to ensure highly balanced accuracy in defining a deleterious mutation. Therefore, mutations with a value smaller than −2.5 were identified as deleterious 33. The specificity and sensitivity values indicate confidence in the prediction (Fig. S1).
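The threshold rule described above can be written as a short function. This is a minimal sketch, assuming scores are already available as floating-point numbers; the names `PROVEAN_CUTOFF` and `classify_provean` are hypothetical. Following the wording of the text, only scores strictly smaller than −2.5 are labeled deleterious; how a score of exactly −2.5 is treated is an assumption of this sketch.

```python
PROVEAN_CUTOFF = -2.5  # threshold stated in the text


def classify_provean(score: float, cutoff: float = PROVEAN_CUTOFF) -> str:
    """Label a PROVEAN score: below the cutoff is deleterious, otherwise neutral."""
    return "deleterious" if score < cutoff else "neutral"


# Scores for W45S and W45L are those reported in the text (ORF8).
# The last entry is a hypothetical value added only to show the neutral branch.
scores = {
    "W45S": -13.22,
    "W45L": -12.278,
    "hypothetical_neutral": -1.0,
}

labels = {name: classify_provean(value) for name, value in scores.items()}
```

Keeping the cutoff as a parameter makes it straightforward to check how sensitive the deleterious count is to the chosen threshold.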

Statistical analysis
The statistical analysis was performed using the R statistical environment v3.6.3 (https://cran.r-project.org/bin/windows/base/old/3.6.3/). Categorical variables were expressed as absolute frequencies and percentages. Differences in single nucleotide variation (SNV) counts per genome between countries and/or continents were analyzed using one-way ANOVA followed by the Chi-square test. The Chi-square test assessed the relationship between each mutation and patient status. The false discovery rate (FDR) was calculated using the Benjamini-Hochberg method to account for multiple hypothesis testing, and only results meeting an FDR cut-off of 5% were considered significant. All p-values were calculated from two-sided tests using 0.05 as the significance level.
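For readers unfamiliar with the Benjamini-Hochberg adjustment, the following is a minimal pure-Python sketch of the standard procedure. The study itself used R, so this is an illustration and not the authors' code; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is the
    # minimum of p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)

# Tests whose adjusted p-value is at most 0.05 pass a 5% FDR cut-off.
significant = [a <= 0.05 for a in adj]
```

In this toy input, only the first test passes the 5% cut-off after adjustment, even though three of the four raw p-values are below 0.05.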

Figure 1. Retrieved SARS-CoV-2 whole genome sequences (WGS) obtained from deceased COVID-19 patients worldwide from the Global Initiative on Sharing All Influenza Data (GISAID) database. (A) SARS-CoV-2 genomes submitted to GISAID from five continents between January 2020 and February 2023. (B) Distribution of sequences across several lineages during the time span. (C) SARS-CoV-2 genomes sequenced by different countries from deceased patients. (D) Gender-wise (male and female) and (E) age-wise distribution of the selected sequences. (F) Distribution of different clades in five continents. (G) Cumulative number of strains retrieved during the time frame from five continents.

Figure 2. Phylogenetic analyses of the 5,724 SARS-CoV-2 genomes sequenced from deceased COVID-19 patients worldwide. (A) A detailed phylogenetic tree presenting all the significant clades associated with deceased COVID-19 patients. (B) Entropy change (distribution of mutational frequency across the SARS-CoV-2 genome) based on the mutation count for each position. The maximum-likelihood tree was generated using the Nextclade Web 2.14.1 web-based tool (https://clades.nextstrain.org/results; accessed on October 10, 2023).

Figure 3. The frequency of nucleotide (NT) mutations found throughout the SARS-CoV-2 genomes of the deceased COVID-19 patients. (A) The number of conversions for specific genes or segments of the SARS-CoV-2 genome. (B) The maximum frequency of NT mutations in particular regions of the SARS-CoV-2 genome. In both panels, specific gene regions are colored by frequency.

Figure 5. Scores of different mutations throughout the SARS-CoV-2 genomes sequenced from deceased COVID-19 patients. (A) Top hundred mutations predicted by the PROVEAN tool. (B) Total frequency of the top mutations predicted by the PredictSNP tool. (C) Prediction of deleterious mutations by PredictSNP.

Table 1. The nucleotide (NT) mutations with the highest frequency predicted at various loci of SARS-CoV-2 genomes obtained from deceased COVID-19 patients of different countries.

Table 2. Most frequent amino acid (AA) mutations predicted at various loci of SARS-CoV-2 genomes obtained from deceased COVID-19 patients of the five continents. Amino acid (AA) mutations occurring in more than two continents are highlighted in bold.